From Relational Data to Graphs: Inferring Significant Links Using Generalized Hypergeometric Ensembles
نویسندگان
چکیده
The inference of network topologies from relational data is an important problem in data analysis. Exemplary applications include the reconstruction of social ties from data on human interactions, the inference of gene co-expression networks from DNA microarray data, or the learning of semantic relationships based on co-occurrences of words in documents. Solving these problems requires techniques to infer significant links in noisy relational data. In this short paper, we propose a new statistical modeling framework to address this challenge. It builds on generalized hypergeometric ensembles, a class of generative stochastic models that give rise to analytically tractable probability spaces of directed, multi-edge graphs. We show how this framework can be used to assess the significance of links in noisy relational data. We illustrate our method in two data sets capturing spatio-temporal proximity relations between actors in a social system. The results show that our analytical framework provides a new approach to infer significant links from relational data, with interesting perspectives for the mining of data on social systems.
منابع مشابه
Multiplex Network Regression: How do relations drive interactions?
We introduce a statistical method to investigate the impact of dyadic relations on complex networks generated from repeated interactions. It is based on generalised hypergeometric ensembles, a class of statistical network ensembles developed recently. We represent different types of known relations between system elements by weighted graphs, separated in the different layers of a multiplex netw...
متن کاملRepresentations and Ensemble Methods for Dynamic Relational Classification
Temporal networks are ubiquitous and evolve over time by the addition, deletion, and changing of links, nodes, and attributes. Although many relational datasets contain temporal information, the majority of existing techniques in relational learning focus on static snapshots and ignore the temporal dynamics. We propose a framework for discovering temporal representations of relational data to i...
متن کاملGeneralized Hypergeometric Ensembles: Statistical Hypothesis Testing in Complex Networks
Statistical ensembles of networks, i.e., probability spaces of all networks that are consistent with given aggregate statistics, have become instrumental in the analysis of complex networks. Their numerical and analytical study provides the foundation for the inference of topological patterns, the definition of network-analytic measures, as well as for model selection and statistical hypothesis...
متن کاملAttribute-oriented Induction Using Domain Generalization Graphs
Howard J. Hamilton, Robert J. Hilderman, and Nick Cercone Department of Computer Science University of Regina Regina, Saskatchewan, Canada, S4S 0A2 fhamilton,hilder,[email protected] Abstract Attribute-oriented induction summarizes the information in a relational database by repeatedly replacing speci c attribute values with more general concepts according to user-de ned concept hierarchies. ...
متن کاملAssociation Rule Mining of Relational Data
Most data of practical relevance are structured in more complex ways than is assumed in traditional data mining algorithms, which are based on a single table. The concept of relations allows for discussing many data structures such as trees and graphs. Relational data have much generality and are of significant importance, as demonstrated by the ubiquity of relational database management system...
متن کامل